71 research outputs found

Efficient Exact Inference in Planar Ising Models

Author: Kamenetsky Dmitry
Schraudolph Nicol N.
Publication date: 01/01/2008

We give polynomial-time algorithms for the exact computation of lowest-energy (ground) states, worst margin violators, log partition functions, and marginal edge probabilities in certain binary undirected graphical models. Our approach provides an interesting alternative to the well-known graph cut paradigm in that it does not impose any submodularity constraints; instead we require planarity to establish a correspondence with perfect matchings (dimer coverings) in an expanded dual graph. We implement a unified framework while delegating complex but well-understood subproblems (planar embedding, maximum-weight perfect matching) to established algorithms for which efficient implementations are freely available. Unlike graph cut methods, we can perform penalized maximum-likelihood as well as maximum-margin parameter estimation in the associated conditional random fields (CRFs), and employ marginal posterior probabilities as well as maximum a posteriori (MAP) states for prediction. Maximum-margin CRF parameter estimation on image denoising and segmentation problems shows our approach to be efficient and effective. A C++ implementation is available from http://nic.schraudolph.org/isinf/

Comment: Fixed a number of bugs in v1; added 10 pages of additional figures, explanations, proofs, and experiments
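
As a reference point for the quantities named in the abstract above, here is a minimal brute-force sketch for very small graphs. It is not the polynomial-time method the abstract describes. It assumes a disagreement-cost energy, where an edge contributes its weight only when its two endpoints take different labels, with no node terms; that energy form, the function names, and the toy triangle are illustrative assumptions. Negative edge weights are allowed, which is the case that submodularity constraints would exclude.

```python
import itertools
import math


def ising_energy(state, edges):
    """Sum the weights of edges whose endpoints disagree (assumed energy form)."""
    return sum(w for (i, j), w in edges.items() if state[i] != state[j])


def brute_force_inference(n, edges):
    """Enumerate all 2**n binary states; return a ground state, its energy, and log Z."""
    best_state, best_energy = None, math.inf
    energies = []
    for state in itertools.product((0, 1), repeat=n):
        e = ising_energy(state, edges)
        energies.append(e)
        if e < best_energy:
            best_state, best_energy = state, e
    # log Z = log sum_y exp(-E(y)), computed with the log-sum-exp shift for stability.
    m = min(energies)
    log_z = -m + math.log(sum(math.exp(-(e - m)) for e in energies))
    return best_state, best_energy, log_z


# Toy triangle (hypothetical data): the negative weight rewards disagreement on edge (0, 2).
edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): -3.0}
ground_state, ground_energy, log_z = brute_force_inference(3, edges)
print(ground_state, ground_energy, log_z)
```

Because it enumerates every state, this sketch is only useful for checking a faster implementation on graphs with a handful of nodes.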

arXiv.org e-Print Archive

CiteSeerX

Graph Kernels

Author: Borgwardt Karsten M.
Kondor Risi
Schraudolph Nicol N.
Vishwanathan S. V. N.
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2010

We present a unified framework to study graph kernels, special cases of which include the random walk (Gärtner et al., 2003; Borgwardt et al., 2005) and marginalized (Kashima et al., 2003, 2004; Mahé et al., 2004) graph kernels. Through reduction to a Sylvester equation we improve the time complexity of kernel computation between unlabeled graphs with n vertices from O(n^6) to O(n^3). We find a spectral decomposition approach even more efficient when computing entire kernel matrices. For labeled graphs we develop conjugate gradient and fixed-point methods that take O(dn^3) time per iteration, where d is the size of the label set. By extending the necessary linear algebra to Reproducing Kernel Hilbert Spaces (RKHS) we obtain the same result for d-dimensional edge kernels, and O(n^4) in the infinite-dimensional case; on sparse graphs these algorithms only take O(n^2) time per iteration in all cases. Experiments on graphs from bioinformatics and other application domains show that these techniques can speed up computation of the kernel by an order of magnitude or more. We also show that certain rational kernels (Cortes et al., 2002, 2003, 2004) when specialized to graphs reduce to our random walk graph kernel. Finally, we relate our framework to R-convolution kernels (Haussler, 1999) and provide a kernel that is close to the optimal assignment kernel of Fröhlich et al. (2006) yet provably positive semi-definite
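
The fixed-point idea for unlabeled graphs can be illustrated with a minimal sketch. It assumes a geometric random walk kernel with decay `lam` and all-ones start and stop weights on unnormalized adjacency matrices; these choices and all function names are illustrative assumptions, not the paper's exact setup. The iteration M <- M0 + lam * A2 M A1^T applies the product-graph adjacency matrix through the identity vec(A2 M A1^T) = (A1 ⊗ A2) vec(M), so the n^2 x n^2 Kronecker product is never formed and each dense iteration costs O(n^3). The result is checked against a truncated power series over walk counts.

```python
def matmul(X, Y):
    """Dense matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def random_walk_kernel_fixed_point(A1, A2, lam, tol=1e-13, max_iter=10_000):
    """Iterate M <- M0 + lam * A2 M A1^T and return the sum of the entries of M."""
    n1, n2 = len(A1), len(A2)
    M0 = [[1.0] * n1 for _ in range(n2)]  # vec(M0) is the all-ones start vector
    A1T = transpose(A1)
    M = [row[:] for row in M0]
    for _ in range(max_iter):
        P = matmul(matmul(A2, M), A1T)
        M_next = [[M0[i][j] + lam * P[i][j] for j in range(n1)] for i in range(n2)]
        change = max(abs(M_next[i][j] - M[i][j]) for i in range(n2) for j in range(n1))
        M = M_next
        if change < tol:
            break
    return sum(map(sum, M))  # all-ones stop weights


def walk_counts(A, max_len):
    """counts[k] = number of walks of length k in the graph, i.e. 1^T A^k 1."""
    v = [1.0] * len(A)
    counts = []
    for _ in range(max_len + 1):
        counts.append(sum(v))
        v = [sum(a * x for a, x in zip(row, v)) for row in A]
    return counts


def random_walk_kernel_series(A1, A2, lam, max_len=80):
    """Reference value: walks in the product graph factor into walks in each graph."""
    c1, c2 = walk_counts(A1, max_len), walk_counts(A2, max_len)
    return sum(lam ** k * c1[k] * c2[k] for k in range(max_len + 1))


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
k_fixed_point = random_walk_kernel_fixed_point(triangle, path, lam=0.05)
k_series = random_walk_kernel_series(triangle, path, lam=0.05)
print(k_fixed_point, k_series)
```

The iteration converges only when `lam` is smaller than the reciprocal of the product of the two adjacency matrices' spectral radii, which holds for the toy graphs used here.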

Caltech Authors

MPG.PuRe

Dynamic Parameter Encoding for genetic algorithms

Author: C.G. Shaefer
D. Whitley
D.E. Goldberg
H.P. Schwefel
J.D. Schaffer
J.H. Holland
J.J. Grefenstette
K. Deb
K.A. De Jong
Nicol N. Schraudolph
R.A. Caruana
Richard K. Belew
V.R. Mandava
W.E. Hart
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Centering Neural Network Gradient Factors

Author: Nicol N. Schraudolph
Publication venue: Springer Verlag
Publication date: 01/01/1998

It has long been known that neural networks can learn faster when their input and hidden unit activities are centered about zero; recently we have extended this approach to also encompass the centering of error signals [2]. Here we generalize this notion to all factors involved in the network's gradient, leading us to propose centering the slope of hidden unit activation functions as well. Slope centering removes the linear component of backpropagated error; this improves credit assignment in networks with shortcut connections. Benchmark results show that this can speed up learning significantly without adversely affecting the trained network's generalization ability

CiteSeerX

Accelerated gradient descent by factor-centering decomposition

Author: Nicol N. Schraudolph
Publication date: 01/01/1998

Gradient factor centering is a new methodology for decomposing neural networks into biased and centered subnets which are then trained in parallel. The decomposition can be applied to any pattern-dependent factor in the network’s gradient, and is designed such that the subnets are more amenable to optimization by gradient descent than the original network: biased subnets because of their simplified architecture, centered subnets due to a modified gradient that improves conditioning. The architectural and algorithmic modifications mandated by this approach include both familiar and novel elements, often in prescribed combinations. The framework suggests for instance that shortcut connections, a well-known architectural feature, should work best in conjunction with slope centering, a new technique described herein. Our benchmark experiments bear out this prediction, and show that factor-centering decomposition can speed up learning significantly without adversely affecting the trained network’s generalization ability.

CiteSeerX

Repository for Publications and Research Data

Local Gain Adaptation in Stochastic Gradient Descent

Author: Nicol N. Schraudolph
Publication date: 01/01/1999

Gain adaptation algorithms for neural networks typically adjust learning rates by monitoring the correlation between successive gradients. Here we discuss the limitations of this approach, and develop an alternative by extending Sutton's work on linear systems to the general, nonlinear case. The resulting online algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical independence between successive training patterns, and do not require an arbitrary smoothing parameter. In our benchmark experiments, they consistently outperform other acceleration methods, and show remarkable robustness when faced with non-i.i.d. sampling of the input space
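
The "typical" scheme mentioned in the first sentence of the abstract above can be sketched as follows. This is a minimal illustration of per-parameter gains driven by the sign agreement of successive gradients, not the algorithm the paper develops. The multiplicative update, the constants `up` and `down`, the function names, and the toy quadratic are all illustrative assumptions.

```python
def gain_adapted_descent(grad, w0, steps, base_rate, up=1.1, down=0.5):
    """Gradient descent with one gain per parameter.

    A gain grows when the current and previous gradient components share a
    sign and shrinks when they differ. Setting up = down = 1.0 recovers plain
    gradient descent.
    """
    w = list(w0)
    gains = [1.0] * len(w)
    prev = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        for i in range(len(w)):
            agreement = g[i] * prev[i]
            if agreement > 0:
                gains[i] *= up
            elif agreement < 0:
                gains[i] *= down
            w[i] -= base_rate * gains[i] * g[i]
        prev = g
    return w, gains


# Toy ill-conditioned quadratic (hypothetical): loss(w) = 0.5 * sum(h_i * w_i**2).
curvatures = [1.0, 100.0]


def grad(w):
    return [h * x for h, x in zip(curvatures, w)]


def loss(w):
    return 0.5 * sum(h * x * x for h, x in zip(curvatures, w))


w_plain, _ = gain_adapted_descent(grad, [1.0, 1.0], 200, 0.001, up=1.0, down=1.0)
w_adapt, gains = gain_adapted_descent(grad, [1.0, 1.0], 200, 0.001)
plain_loss, adapted_loss = loss(w_plain), loss(w_adapt)
print(plain_loss, adapted_loss)
```

On this deterministic problem the gains let the low-curvature coordinate take much larger steps than a single global rate allows. The sketch says nothing about the stochastic, non-i.i.d. setting the abstract is concerned with.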

CiteSeerX

Crossref